Log-Linear Models for Word Alignment
نویسندگان
چکیده
We present a framework for word alignment based on log-linear models. All knowledge sources are treated as feature functions, which depend on the source langauge sentence, the target language sentence and possible additional variables. Log-linear models allow statistical alignment models to be easily extended by incorporating syntactic information. In this paper, we use IBM Model 3 alignment probabilities, POS correspondence, and bilingual dictionary coverage as features. Our experiments show that log-linear models significantly outperform IBM translation models.
منابع مشابه
Unsupervised Word Alignment with Arbitrary Features
We introduce a discriminatively trained, globally normalized, log-linear variant of the lexical translation models proposed by Brown et al. (1993). In our model, arbitrary, nonindependent features may be freely incorporated, thereby overcoming the inherent limitation of generative models, which require that features be sensitive to the conditional independencies of the generative process. Howev...
متن کاملContrastive Unsupervised Word Alignment with Non-Local Features
Word alignment is an important natural language processing task that indicates the correspondence between natural languages. Recently, unsupervised learning of log-linear models for word alignment has received considerable attention as it combines the merits of generative and discriminative approaches. However, a major challenge still remains: it is intractable to calculate the expectations of ...
متن کاملLog-linear Models for Uyghur Segmentation in Spoken Language Translation
To alleviate data sparsity in spoken Uyghur machine translation, we proposed a log-linear based morphological segmentation approach. Instead of learning model only from monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken translation based on both bilingual and monolingual corpus. Our approach relies on several features such as traditional conditional random fiel...
متن کاملAlignment templates: the RWTH SMT system
In this paper, we describe the RWTH statistical machine translation (SMT) system which is based on log-linear model combination. All knowledge sources are treated as feature functions which depend on the source language sentence, the target language sentence and possible hidden variables. The main feature of our approach are the alignment templates which take shallow phrase structures into acco...
متن کاملLexicon models for hierarchical phrase-based machine translation
In this paper, we investigate lexicon models for hierarchical phrase-based statistical machine translation. We study five types of lexicon models: a model which is extracted from word-aligned training data and—given the word alignment matrix—relies on pure relative frequencies [1]; the IBM model 1 lexicon [2]; a regularized version of IBM model 1; a triplet lexicon model variant [3]; and a disc...
متن کامل